Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference

Authors

  • Yujia Xiao
  • Frank K. Soong
Abstract

We investigate how to assess the prosody quality of an ESL learner’s spoken sentence against a native speaker’s natural recording or a TTS-synthesized voice. A spoken English utterance read by an ESL learner is compared with a recording of the same sentence by a native speaker or by a TTS voice. The corresponding F0 contours (with voicings) and breaks are compared at the mapped syllable level via DTW. The correlations between the prosody patterns of the learner and the native speaker (or TTS voice) for the same sentence are computed after the speech rates and F0 distributions of the speakers are equalized. Based on the collected native and non-native speaker databases and the resulting correlation coefficients, we use Gaussian mixtures to model them as continuous distributions for training a two-class (native vs. non-native) neural net classifier. We found that classification accuracy with the native speaker’s reference and with the TTS reference is close, i.e., 91.2% vs. 88.1%. For assessing the prosody proficiency of an ESL learner from a single input sentence, the prosody patterns of our high-quality TTS are almost as effective as those of native speakers’ recordings, which are more expensive and inconvenient to collect.
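The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each utterance is already reduced to one F0 value per syllable, uses z-score normalization as a stand-in for the F0 distribution equalization, and applies a textbook DTW followed by a Pearson correlation over the aligned pairs. The function names `zscore`, `dtw_path`, and `prosody_correlation` are hypothetical, and breaks and voicing are not modeled.

```python
import numpy as np


def zscore(x):
    """Zero-mean, unit-variance scaling (assumed stand-in for F0 equalization)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()


def dtw_path(a, b):
    """Textbook DTW on two 1-D sequences; returns the aligned index pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the last cell to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path


def prosody_correlation(learner_f0, reference_f0):
    """Pearson correlation of normalized F0 values along the DTW alignment."""
    a, b = zscore(learner_f0), zscore(reference_f0)
    path = dtw_path(a, b)
    xa = np.array([a[i] for i, _ in path])
    xb = np.array([b[j] for _, j in path])
    return float(np.corrcoef(xa, xb)[0, 1])


# Hypothetical per-syllable F0 values in Hz, for illustration only.
reference = np.array([180.0, 220.0, 200.0, 160.0, 150.0])
learner = 0.6 * reference + 10.0  # same contour shape, different pitch range
score = prosody_correlation(learner, reference)
```

In this toy case the learner contour is an affine copy of the reference, so the normalized contours coincide and the correlation is close to 1. In the pipeline the abstract describes, correlation coefficients of this kind are then modeled with Gaussian mixtures and fed to the native vs. non-native classifier, which this sketch does not cover.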

Similar resources

Aperiodicity Analysis for Quality Estimation of Text-to-Speech Signals

This contribution presents a new approach towards non-intrusive quality assessment of text-to-speech (TTS) signals. Perturbation measures which capture the degree of excitation-specific aperiodicity in voiced speech are investigated concerning their quality implications in synthesized speech. Based on two independent TTS databases for which formal attribute-based listening tests have been conducte...

Adapting Prosody in a Text-to-Speech System

The requirements of the evolving information communication technologies (ICT) place new demands on text-to-speech (TTS) systems. The modern high quality TTS system has to be capable of fast and high-quality adaptation to a new language, voice or even expressive speech. Thus adaptation to new voices with different prosodic characteristics is desired. In this chapter a survey of recent and past a...

Improving intelligibility of synthesized speech in noise with emphasized prosody

The performance of current high-quality concatenative text-to-speech (TTS) systems is limited in noisy environments. This paper investigates whether the intelligibility of synthesized speech in noise can be improved by emphasizing the prosody. Additionally, the paper presents a method that can effectively emphasize the prosody of units in existing TTS databases. The circular linear pr...

Performance Analysis of Text To Speech Synthesis System Using HMM And Prosody Features With Parsing For Tamil Language

This paper describes a Hidden Markov Model (HMM) based TTS system and a prosody-based TTS system for producing natural-sounding synthetic speech in the Tamil language. The HMM-based system consists of two phases, training and synthesis. Tamil speech is first parameterized into spectral and excitation features using Glottal Inverse Filtering (GIF). An emotions present in the input text is...

The New Slovenian Text-to-Speech System

Human-computer interaction in a natural language is becoming possible due to the rapid development of computing power. While text-to-speech (TTS) systems for major world languages are quite advanced, smaller languages, like our Slovenian language, lack quality TTS synthesis. At the "Jozef Stefan" Institute a system called GOVOREC (SPEAKER) has been developed which is capable of automatic conversion ...

Publication date: 2017